Molecular Systems Biology — Latest Matching Preprints

1

Curating MitoCore: A Standardized Small-Scale Human Metabolic Model as Platform for Proteomics Integration and Disease Modeling

Lange, E.; Santamaria, A. B. R.; Heyer, R.

2026-07-09 systems biology 10.64898/2026.06.29.734258 medRxiv

Top 0.1%

22.6%

Motivation: Central human metabolism powers cellular processes, yet its dysregulation in disease remains poorly understood. While comprehensive genome-scale metabolic models like Human-GEM are available, their size limits interpretability and computational efficiency. Conversely, the smaller MitoCore model is more manageable but lacks the standardized annotations and curated gene-protein-reaction (GPR) associations necessary for omics integration such as protein-constrained modeling. Improving MitoCore's annotation quality is therefore essential for its use in integrative workflows. Results: We systematically updated MitoCore to enhance compatibility with the protein-constrained modeling framework sMOMENT. By restructuring legacy annotations and integrating data from Human-GEM and MitoMammal, we increased EC-codes from 354 to 593 and UniProt-annotated genes from 0 to 592. MitoCore captures central metabolic processes, confirmed by mapping its reactions to 51 of 106 metabolic KEGG modules. Integration of thrombocyte proteomics and experimental ATP data for original and curated models showed an increase in mapped proteins (228 to 294) and reactions with kcat values (295 to 310), adding 43 protein-constrained reactions. Consequently, prediction errors for exchange fluxes and ATP production decreased by 19% and 88%, respectively, with 100% of ATP predictions falling within the 95% confidence interval (compared to 16% for the original model). Finally, we implemented a continuous integration/continuous deployment pipeline for automated updates from future Human-GEM releases. These improvements provide a computationally efficient, well-annotated model for studying central metabolism across human cell types. Availability and Implementation: All source code for reproducing results from this paper is available at https://doi.org/10.5281/zenodo.20813825.

2

Shield-4i: A Whole-mount Multiplexed Imaging Platform for Studying Multiscale Information Flow in 3D Multicellular Systems

Hornbachner, R.; Shamipour, S.; Arslan, F. N.; Fan, R.; Hess, M.; Curvaia, F.; Lüthi, J.; Oates, A. C.; Bedzhov, I.; Gilmour, D.; Uhlmann, V.; Pelkmans, L.

2026-07-09 systems biology 10.64898/2026.07.01.735871 medRxiv

Top 0.1%

15.3%

Self-organization in multicellular systems emerges from reciprocal interactions across spatiotemporal scales. Understanding how subcellular organization, tissue remodeling and developmental outcome are coordinated thus requires simultaneous profiling of biological processes spanning orders of magnitude in space and time. Yet, a unified experimental and computational framework for capturing these multiscale properties across in vivo and stem cell-derived systems has been lacking. Here, we introduce Shield-4i, a high-throughput, versatile, and accessible method for automated in toto iterative immunofluorescence imaging of whole-mount structures at subcellular resolution. Through polyepoxide-mediated inter- and intramolecular crosslinking, Shield-4i preserves sample integrity during repeated SDS-based elution cycles. We benchmark this method in gastrulating zebrafish and post-implantation mouse embryos and demonstrate its applicability to stem cell-derived 3D gastruloids, achieving up to 30-plex measurements of proteins and their post-translational modifications across hundreds of samples. To enable scalable analysis, we developed a dedicated 3D workflow supporting OME-Zarr-based and FAIR-compliant data storage, standardized processing, and multiscale feature extraction. Applying this framework to investigate gastruloid self-organization, we quantify how cellular physicochemical state and signaling properties encode cell position along embryonic axes and connect molecular patterning and fate decisions to morphological symmetry breaking at the multicellular scale. Together, Shield-4i provides a high-content in toto spatial proteomics platform for dissecting multiscale information flow and self-organization in multicellular systems.

3

Machine Learning Gap-Fills Missing Transporter Kinetics in Biosystems Across Scales

Qiu, S.; Guo, Z.; Tu, W.; Zhuang, Y.; Wu, S.; Wang, G.

2026-07-07 systems biology 10.64898/2026.07.02.735998 medRxiv

Top 0.1%

15.1%

Understanding transporter kinetics is essential for deciphering metabolite exchanges in biosystems, particularly for cells subject to substrate gradients. Nevertheless, the prediction of transporter kinetic parameters, maximum rate per gram protein (Vmax) and Michaelis-Menten constant (Km), has not yet been tackled. Here, we developed the first compound-protein interaction machine learning model of transporter Vmax and Km, MMTKPred, which achieved R²=0.553, RMSE=1.155 mmol/hr/g Protein and R²=0.330, RMSE=0.935 mM for log10-scaled Vmax and Km prediction, respectively. Moreover, we demonstrated MMTKPred's predictive power across biosystem scales, from capturing transporter kinetics modulated by point mutations and substrate changes at the molecular level, to enabling substrate-sensitive metabolic modelling of non-model yeasts at the cellular level, and rationalizing inter-species substrate competition in co-cultures. Collectively, MMTKPred effectively models metabolite transport spanning from molecular to multi-species scales, thereby offering a computational tool for rational microbial cell factory optimization.
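For context, Vmax and Km are the two parameters of the standard Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). The sketch below is not MMTKPred and uses no values from the preprint; it only illustrates how a (Vmax, Km) pair, however obtained, turns a substrate concentration into a transport rate. The function name and the numbers are illustrative assumptions.

```python
def michaelis_menten_rate(vmax: float, km: float, substrate: float) -> float:
    """Transport rate under the Michaelis-Menten rate law.

    vmax      -- maximum rate (e.g. mmol/hr/g protein)
    km        -- substrate concentration at half-maximal rate (e.g. mM)
    substrate -- substrate concentration, same unit as km
    """
    if km <= 0 or substrate < 0:
        raise ValueError("km must be positive and substrate non-negative")
    return vmax * substrate / (km + substrate)


# Illustrative (made-up) parameters: at [S] == Km the rate is half of Vmax.
rate_at_km = michaelis_menten_rate(vmax=10.0, km=2.0, substrate=2.0)
print(rate_at_km)
```

Because the rate saturates at Vmax for [S] much larger than Km, errors in Km matter most at low substrate concentrations, which is the regime the abstract highlights for cells in substrate gradients.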

4

ChemoTrack: A comprehensive dataset linking single-cell migration trajectories to precisely defined chemotactic signals

Panigrahi, D.; Sakurai, N.; Mijanovic, L.; Versluis, D.; Tweedy, L.; Pearce, P.; Machesky, L.; Insall, R.

2026-06-29 cell biology 10.64898/2026.06.28.734951 medRxiv

Top 0.1%

12.1%

Chemotaxis drives cell migration in processes ranging from wound healing to embryonic development and cancer metastasis, yet its quantitative understanding remains limited because responding cells change and degrade attractant gradients, and existing datasets are too small and imprecise to capture stochastic behaviour. We present ChemoTrack, a publicly accessible resource comprising 2 million measurements from 500,000 migration tracks, in which the chemoattractant gradient and concentration experienced by every cell at the time of observation are precisely determined. The dataset includes microscopy images and trajectories spanning a full range of biologically relevant chemotactic conditions. Analysis shows that cells steer according to absolute differences in active receptor number, not fractional receptor occupancy, and maximal chemotaxis is not predicted by half-maximal receptor occupancy. By combining scale, precision and accessibility, ChemoTrack shifts quantitative description of eukaryotic chemotaxis from experimental conditions to the instantaneous chemical signal experienced by individual cells, enabling future mathematical and mechanistic analyses.

5

Stability-driven multi-omics integration for reproducible latent structure

Guan, H.; Gerwen, M. v.; Kim-Schulze, S.; Colicino, E.; Dolios, G.; Petrick, L.

2026-06-27 bioinformatics 10.64898/2026.06.23.734064 medRxiv

Top 0.1%

10.7%

High-dimensional multi-omics data integration offers novel opportunities to characterize complex biological systems. Even though sampling variability frequently compromises findings, particularly in small cohorts, the reproducibility and generalizability of the derived latent structures are insufficiently evaluated. We propose a Stability-driven framework for multi-omics integration that combines sparse generalized canonical correlation analysis with repeated cross-validation, out-of-sample projection, and systematic evaluation of both component-level and feature-level stability. We apply this framework to untargeted metabolomic and Olink targeted inflammation proteomic profiles in a thyroid cancer case-control cohort (n = 162). Our Stability-driven integration identified reproducible metabolomic and proteomic latent components that showed consistent out-of-sample disease associations and tracked temporally structured changes relative to time to diagnosis. The proposed framework provides a generalizable strategy for identifying reproducible latent structures that improve robustness of biological inference in multi-omics studies.

6

Learning proteomic disease trajectories with flow matching

Hartman, E.; Karlsson, C.; Malmström, J.

2026-07-13 bioinformatics 10.64898/2026.07.08.737311 medRxiv

Top 0.1%

10.0%

High-throughput proteomics has enabled detailed characterization of molecular states across health and disease. However, biological systems are inherently dynamic and methods for reconstructing continuous proteome changes remain limited. Here, we introduce proteome velocity, a framework for inferring continuous proteome trajectories from cross-sectional or sparsely sampled proteomics data using flow matching, in which a neural network learns velocity fields over proteome space. Proteome velocity estimates how rapidly and in which direction protein abundances change along a biological progression, such as disease. In mouse sepsis, covariate-conditioned velocity models resolved tissue- and pathogen-specific proteome trajectories and identified inflammatory proteins with distinct temporal activation patterns across infection routes and organ systems. In clinical COVID-19 plasma proteomes, inferred trajectories separated into distinct velocity programs associated with disease severity. These results show how generative trajectory models can transform cross-sectional proteomics data into interpretable, protein-resolved representations of molecular progression.

7

Adenylyl cyclases combinatorially integrate opposing dopamine receptor signals

Gregrowicz, J.; Elowitz, M. B.

2026-07-13 systems biology 10.64898/2026.07.10.737756 medRxiv

Top 0.1%

9.8%

Dopamine receptors are divided into two families which exert opposing effects on the second messenger cyclic AMP (cAMP). While most neuronal cell types express a single receptor subtype, some neurons co-express opposing receptor subtypes. It remains unclear how these cells could resolve simultaneous stimulatory and inhibitory inputs. Here, we introduce a multiplexed assay that quantifies surface receptor abundance and dynamic cAMP output in single cells. Using this assay, together with mathematical modeling, we demonstrate that signals from opposing receptor subtypes are integrated flexibly by downstream adenylyl cyclases (ACs) rather than at the receptor level. Because AC isoforms exhibit unique biochemical properties, a cell's AC expression profile determines whether conflicting inputs are cancelled, suppressed, or amplified. Brain transcriptome analysis indicates that co-expression of opposing dopamine receptors is associated with expression of specific AC isoforms predicted to sustain signaling during multi-receptor activation. Our results show that dopamine signal integration depends on the expression profiles of receptors and AC isoforms in a predictable way.

8

Clinical Trial and Ontology-Derived Positive and Negative Benchmark Datasets for Drug Repurposing Across Rare Diseases

Ravandi, C. B.; Mowrey, W.; Chatterjee, A.; Khanshan, F.; Haddadi, P.; Mobarec, J. C.; Lambden, S.; Eliassi-Rad, T.; Ricchiuto, P.; Risa, G.

2026-07-08 systems biology 10.64898/2026.06.15.732135 medRxiv

Top 0.1%

9.6%

Evaluating the potential applications of a medicine is a fundamental challenge in drug development. There is a lack of standardized, decision-oriented benchmarks that test whether computational models can generalize therapeutic hypotheses across diseases in ways that reflect real-world pharmaceutical investment decision making. To address this gap, we introduce two complementary resources: the Indication Expansion Investment Decision Network (IxIDN) and the Orphanet Rare Disease Ontology Negative-network (ORDON). IxIDN is a clinical-trial-derived positive benchmark constructed by projecting drug-disease associations from pharmaceutical clinical trials into a disease-disease network; each edge connects disease pairs that have entered clinical trials for the same drug, thereby capturing cases when concrete indication-expansion decisions have been made. The current release contains 574 rare diseases and 5,336 edges. In contrast, ORDON serves as a stringent, biology-aware negative benchmark derived from the authoritative Orphanet Rare Disease Ontology. It identifies maximally distant disease pairs according to curated hierarchical structure and genetics-linked inheritance patterns, providing 793 rare diseases and 5,000 edges that represent high-separation negative candidates across therapeutic areas. Together, IxIDN and ORDON enable rigorous cross-evidence generalization from clinical trials to disease ontology, testing for Disease-Disease Association Learning (DDAL), a core task for mechanism-centered drug repurposing and indication expansion. All data are publicly available with detailed metadata, enabling reproducible evaluation of models on transparent, decision-relevant benchmarks.
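The projection step described for IxIDN (connect two diseases whenever both have entered clinical trials for the same drug) can be sketched as a one-mode projection of a drug-disease mapping. This is a minimal illustration under assumed inputs, not the authors' pipeline: the function name, the dictionary layout, and the drug and disease labels are all hypothetical.

```python
from itertools import combinations


def project_to_disease_network(drug_to_diseases):
    """Project drug -> diseases associations onto disease-disease edges.

    Returns a set of undirected edges, each stored as a sorted 2-tuple,
    linking every pair of diseases that share at least one drug.
    """
    edges = set()
    for diseases in drug_to_diseases.values():
        # De-duplicate and sort so that each pair has one canonical form.
        for a, b in combinations(sorted(set(diseases)), 2):
            edges.add((a, b))
    return edges


# Hypothetical toy input: two drugs trialled in overlapping disease sets.
trials = {
    "drug_A": ["disease_1", "disease_2", "disease_3"],
    "drug_B": ["disease_2", "disease_4"],
}
edges = project_to_disease_network(trials)
print(sorted(edges))
```

Storing edges as sorted tuples in a set keeps the network undirected and prevents a pair that shares several drugs from being counted more than once.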

9

Regulatory memory and growth-coupled inheritance shape nutrient-dependent flagella number variation in Salmonella

Barua, A.; Giralt-Zuniga, M.; Erhardt, M.; Hatzikirou, H.

2026-07-10 systems biology 10.64898/2026.07.10.737737 medRxiv

Top 0.1%

9.5%

Bacteria must balance the advantage of movement against the cost of building flagella, yet how nutrient availability shapes variation in flagellar number across single cells remains unclear. Here, we combine time-resolved basal-body measurements in Salmonella enterica with a mechanistically constrained stochastic model of flagellar remodeling. The model separates two routes from nutrient availability to flagellar number: an RflP-dependent regulatory memory that sets a synthesis target via a latent sensing-memory variable, and a physical inheritance process in which synthesis, binomial partitioning, and division reshape the flagellar-number distribution. Coarse-graining this process yields leaky-integrator dynamics in which the mean flagellar number tracks a regulatory target, while the latent correlation follows acquisition-decay dynamics. In wild-type cells, nutrient-dependent acquisition raises the target and increases flagellar investment; in ΔrflP cells, loss of acquisition produces a transient overshoot that isolates the intrinsic decay (memory) timescale of the regulatory state. The fitted model predicts a held-out nutrient condition and reveals which parameter combinations are identifiable from the data. Analysis of the fitted dynamics suggests that precision is tuned primarily by sensing-dependent signal amplitude, rather than integration time, with an apparent ~1.7-fold increase in effective wild-type noise amplitude after mean normalization, consistent with a precision cost of active regulation relative to the mutant. A model-free, information-geometric (Cramér-Rao) speed limit further shows that active remodeling approaches the statistical speed allowed by the observed distributional variability. Together, these results reveal how Salmonella cells couple regulatory memory with growth-dependent inheritance to record recent nutrient history in their flagellar number.

10

Tabular Foundation Models Are Competitive Cellular Perturbation Predictors Across Biological Scales

Palla, G.; Hillsley, A.; Kim, Y.-J.; Royer, L. A.

2026-07-01 bioinformatics 10.64898/2026.06.28.735106 medRxiv

Top 0.1%

9.5%

Predicting how cells respond to genetic and chemical perturbations is a central challenge in drug discovery and functional genomics. A growing ecosystem of specialized single-cell foundation models has been developed to address this problem, yet their practical advantage over domain-agnostic approaches remains unclear. Here we evaluate the power of Tabular Foundation Models such as TabICL and TabPFN, general-purpose pre-trained regression models, against domain-specific architectures including PRESAGE, scGPT, scLAMBDA, STACK and Prophet across four complementary evaluation settings: cell-level in-context cross-cell-type prediction, pseudobulk perturbation prediction on five Perturb-seq datasets of cell lines, a genome-wide CRISPR screen in primary human CD4+ T cells, and embryo-level cell-type composition prediction in a zebrafish developmental perturbation atlas. In the cell-level cross-cell-type perturbation prediction, Tabular Foundation Models perform on par with or better than specialized models. On pseudobulk perturbation prediction, Tabular Foundation Models consistently outperform specialized baselines across multiple evaluation metrics and datasets. On whole-embryo cell-type composition prediction, Tabular Foundation Models are competitive with specialized baselines. These results demonstrate that general-purpose tabular in-context learning provides a strong and scalable alternative to bespoke biological architectures for perturbation response modeling across cell systems and scales.

11

Synthesizing Mechanistic Hypotheses from Single-Cell Omics via Discretized Feature Attribution and Empirical Language Model Grounding

Chen, J.; Hong, Y.; Bermudez, A.; Hu, J.; Hsieh, C.-J.; Lin, N.

2026-07-10 systems biology 10.64898/2026.07.09.737344 medRxiv

Top 0.1%

8.6%

Single-cell multimodal omics offer unprecedented resolution of cellular networks, yet translating continuous computational attributions into structured, testable biological mechanisms remains a persistent bottleneck. To address this limitation, we introduce an analytical pipeline employing decision trees to discretize continuous neural network attributions into explicit regulatory thresholds. These boundaries then structurally constrain large language models, enabling them to integrate established literature with empirical data to synthesize context-specific hypotheses. Applying this continuous-to-discrete framework across sparse datasets yielded novel biological mechanisms. Specifically, the framework articulated a cytoskeletal gating hierarchy governing EGF-stimulated pathways, identified transcriptomic drivers of input resistance in cortical interneurons, and delineated translational logic predicting Ki-67 abundance within spatial transcriptomics. Retrospective benchmarking validated the capacity of the framework to autonomously reconstruct published regulatory logic. Supported by a locally deployable open-weight language model and a code-free interface, this approach establishes an auditable methodology to extract robust experimental hypotheses from high-dimensional single-cell data.

12

Population-scale molecular reconstruction of human circadian phase from blood biomarkers

Albinana, C.; Richmond, R.; Wang, B.; Urpa, L.; Crouse, J.; Zeng, Y.; Rosoff, D.; Abdi, S.; FinnGen Consortium; Li, L.; Chen, Z.; Millwood, I. Y.; Ollila, H. M.; Hickie, I.; Gachon, F.; Kramer, A.; Ray, D.; Wray, N.

2026-07-13 genetic and genomic medicine 10.64898/2026.07.08.26356418 medRxiv

Top 0.2%

7.7%

Circadian timing influences human physiology and disease risk, yet scalable measures of molecular circadian phase are lacking. Here we infer circadian phase from circulating blood biomarkers in UK Biobank. Among 3,228 plasma biomarkers, 58% exhibit significant diurnal variation, with harmonic modeling identifying acrophase clustering consistent with canonical circadian patterns and independent constant-routine datasets. Machine-learning models trained on plasma proteomics predict sampling time (R²=0.68) and retain substantial accuracy with ~60 proteins. We define a novel construct, circadian acceleration (CA), as deviation from the population-average phase; CA is temporally stable, associates with chronotype and shift work, and responds to environmental perturbation. CA is heritable (h²SNP=0.10) and genetically correlated with chronotype and accelerometry-derived sleep traits. These results establish plasma proteomics as a scalable approach for population-level molecular circadian phenotyping.
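As an illustration of what "harmonic modeling" and "acrophase" mean here, the sketch below fits a single-harmonic (cosinor) model y(t) = M + A·cos(ω(t − t_peak)) and returns the time of peak, t_peak. It is not the authors' analysis: it assumes a fixed 24 h period and samples spaced evenly over whole periods (so the cosine and sine regressors are orthogonal and closed-form projections suffice), and all names and the synthetic data are hypothetical.

```python
import math


def cosinor_acrophase(times_h, values, period_h=24.0):
    """Estimate mesor, amplitude and acrophase (hours) of a single harmonic.

    Assumes samples are evenly spaced over an integer number of periods;
    unevenly sampled data would need a full least-squares fit instead.
    """
    omega = 2.0 * math.pi / period_h
    n = len(values)
    mesor = sum(values) / n
    # Project the centred signal onto cos(omega*t) and sin(omega*t).
    b_cos = 2.0 / n * sum((v - mesor) * math.cos(omega * t)
                          for t, v in zip(times_h, values))
    b_sin = 2.0 / n * sum((v - mesor) * math.sin(omega * t)
                          for t, v in zip(times_h, values))
    amplitude = math.hypot(b_cos, b_sin)
    acrophase_h = (math.atan2(b_sin, b_cos) / omega) % period_h
    return mesor, amplitude, acrophase_h


# Synthetic hourly signal peaking at 08:00 with mesor 5 and amplitude 2.
times = list(range(24))
signal = [5.0 + 2.0 * math.cos(2.0 * math.pi / 24.0 * (t - 8.0)) for t in times]
mesor, amplitude, acrophase = cosinor_acrophase(times, signal)
print(round(mesor, 3), round(amplitude, 3), round(acrophase, 3))
```

Clustering of acrophases across biomarkers, as reported in the abstract, would then amount to comparing the fitted peak times of many such per-biomarker fits.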

13

A calibrated temporal reference map of disease progression

Tian, J.; Azhir, A.; Hugel, J.; Patel, C.; Estiri, H.

2026-06-29 public and global health 10.64898/2026.06.24.26356443 medRxiv

Top 0.2%

7.3%

Background. Understanding the evolution of human illness requires capturing the temporal directionality of disease progression, yet existing biomedical reference maps largely describe cross-sectional states or static comorbidity. We introduce a directed, probability-ranked map (i.e., a knowledge-base) of clinical progression derived from population-scale longitudinal electronic health records. Methods. The knowledge-base was constructed from de-identified EHRs of 295,678 individuals across the Mass General Brigham system, yielding 435,240 phenotype-pair-duration associations via temporal Spearman correlation. To distinguish biological progression from administrative artefact at scale, we distilled a locally deployed MedGemma labeling function into two complementary classifiers: a RF capturing local episodic signal and a GNN aggregating global network topology via message passing. Their outputs were combined as an unweighted late-fusion average. Classifier confidence was systematically evaluated against pairwise genome-wide genetic correlation estimates from the UK Biobank as an independent biological reference standard. Results. Both classifiers achieved comparable distillation fidelity on the 200-row development set (RF AUROC 0.772; GNN AUROC 0.769). Genetic support was concentrated in the highest confidence deciles, with both models achieving highly significant top-decile enrichment for validated genetic pleiotropy (RF: 1.36-fold, p < 0.001; GNN: 1.32-fold, p < 0.001), demonstrating that classifier confidence aligned with independent genomic support. The framework additionally identified two complementary classes of progression: acquired mechanical cascades with high classifier confidence but null genomic overlap (exemplified by musculoskeletal pain progressing to cardiac dysrhythmias beyond 90 days, Pavg = 0.984, rg = 0.046, prg = 0.539), and topological bridges structurally enforced by network architecture despite sparse local co-occurrence (exemplified by acute myocardial infarction to epilepsy within 0-14 days, PGNN = 0.930 versus PRF = 0.332). Conclusions. By transitioning from static comorbidity networks to a confidence-ranked landscape of temporal trajectories, the map provides a biologically calibrated coordinate system for prioritising mechanistic, translational, and clinical investigation of disease progression.

14

Scop3P in 2026: an expanded proteomics-informed resource contextualizing phosphorylation sites through sequence, structure, mutation, and experimental provenance

Ramasamy, P.; Tichshenko, N.; Diaz, A.; Velghe, K.; Massignani, E.; Vranken, W. F.; Martens, L.

2026-07-06 bioinformatics 10.64898/2026.07.03.736340 medRxiv

Top 0.2%

7.2%

Protein phosphorylation is a central regulatory mechanism controlling protein activity, interactions, and cellular signalling, and its dysregulation is implicated in numerous diseases. Advances in mass spectrometry-based phosphoproteomics have led to a rapid expansion in the number of reported phosphorylation sites; however, interpretation of these data remains challenging due to fragmented evidence, limited structural context, and the lack of uniform experimental provenance across resources. Interpretation is further complicated by the fact that the biological meaning of reported phosphosites can vary substantially across tissues, cell lines, perturbations, and disease settings. Here, we present a major update of Scop3P, a proteomics-informed knowledgebase that contextualizes human phosphorylation sites within integrated sequence, structural, biophysical, evolutionary, and mutational frameworks. The current release incorporates uniformly reprocessed human phosphoproteomics data from 116 PRIDE datasets alongside curated UniProt annotations, retaining peptide-spectrum matches, site localization confidence, and direct links to primary mass spectrometry evidence via Universal Spectrum Identifiers. This integration yields 152,350 unique serine, threonine, and tyrosine phosphorylation sites across 16,533 human proteins, supported by full experimental provenance. Beyond site identification, Scop3P provides residue-level contextual annotations derived from experimentally determined protein structures and proteome-wide AlphaFold models, enabling near-complete structural coverage of phosphorylation sites. Structural context is further complemented by residue-level biophysical, evolutionary, and mutational annotations, supporting integrated assessment of phosphorylation in functional and disease-related settings. The current release also introduces residue interaction network representations derived from AlphaFold-predicted structures, capturing spatial connectivity and local interaction environments of phosphorylation and mutation sites. A redesigned web interface enables interactive exploration through coordinated 1D, 2D, 2.5D, and 3D visualizations, peptide-level coverage views, and direct access to original spectra via PRIDE. By bridging experimental phosphoproteomics with structural, functional, and disease-related context, Scop3P provides a scalable and provenance-aware resource for phosphosite interpretation, hypothesis generation, and data-driven modelling of phosphorylation-dependent regulation.

15

Dynamic Patterns of Nuclear Transcription Factor Abundance in Plant Basal Immunity Revealed by Spatial Proteomics of Arabidopsis Nuclei

Ayash, M.; Proksch, C.; Thieme, D.; Bauer, N.; Lee, J.; Heilmann, I.; Hoehenwarter, W.

2026-07-09 plant biology 10.64898/2026.06.30.735533 medRxiv

Top 0.2%

6.7%

The control of amount of nuclear proteins is fundamental in regulating plant gene expression, but the mechanisms of quantitative dynamics of the nuclear proteome are largely unstudied during adaptive responses to pathogens. Highly specific labeling, enrichment and measurement of the nuclear proteome was performed using TurboID LC-MS of Arabidopsis thaliana leaves treated with the pathogen-associated molecular pattern (PAMP), flg22, and/or cycloheximide. The chosen experimental approach allowed discrimination of the effects of translation, nuclear protein import, trafficking of preexisting proteins, derepression, and nuclear protein turn-over upon elicitation of basal immunity. The highly specific, deep coverage of proteins in the nucleus makes this study a resource for anyone interested in plant nuclear proteome dynamics and defense. Around 2,000 nuclear proteins were repeatedly quantified, including more than 300 transcription factors or other proteins related to transcription. Several proteins with documented activity in endosomes were newly synthesized and imported into nuclei upon PAMP challenge, suggesting alternative nuclear functions in PAMP-triggered immunity (PTI). Circadian clock components, including the transcription factor, CIRCADIAN CLOCK ASSOCIATED 1 (CCA1)-HIKING EXPEDITION (CHE), were depleted upon PAMP challenge, suggesting a safeguard against untimely induction of systemic acquired resistance (SAR). Based on proteomic patterns, proteins moonlighting in the nucleus as well as trafficking and turn-over regulation of the proteome are common elements during plant immunity.

16

Deep dynamical models of single-cell multiomic velocities predict loss-of-function and rescue perturbations in B cells

Karbalayghareh, A.; Pelzer, B.; Chin, C. R.; Melnick, A.; Barisic, D.; Leslie, C. S.

2026-07-08 systems biology 10.1101/2025.04.24.650458 medRxiv

Top 0.2%

6.7%

We present DynaVelo, a generative neural ordinary differential equation model that learns the joint dynamics of gene expression and transcription factor (TF) motif activities in evolving cell systems using single-cell multiome with joint gene expression and chromatin accessibility readout. DynaVelo leverages partial RNA velocity information together with single-cell TF motif accessibility data to improve the modeling of cell state dynamics and identification of TF drivers. We show that DynaVelo recovers the complex and bifurcating in vivo dynamics of wildtype murine germinal center (GC) B cells and reveals how these cell dynamics change under loss-of-function mutations in epigenetic regulators Arid1a and Ctcf. DynaVelo resolves how TF motif activities evolve along latent time trajectories using analysis of training cells or through generated trajectories from the model. In silico perturbation analysis further enables DynaVelo to infer dynamic and cell-state-specific gene regulatory networks (GRNs), recovering many known TF-to-gene edges in the wildtype GC GRN and predicting those that are disrupted in mutants. Finally, in silico gene and TF perturbations allow both the prediction of cell dynamics under loss-of-function genetic mutations and the identification of TF perturbations to rescue loss-of-function dynamic and immunological phenotypes. This analysis predicted that Ctcf knockout would rescue Arid1a loss-of-function phenotype in the GC reaction and nominated Bcl6 and Stat3 as additional TFs whose knockout would rescue Arid1a loss. We validated these predictions in vivo using double heterozygous mutant mice, confirming rescue of the Arid1a dark zone phenotype in all cases and quantitatively assessing model predictions using multiome in Arid1aHet;CtcfHet double heterozygous mice. DynaVelo therefore provides a powerful new deep learning framework for modeling and perturbing dynamic cell systems by harnessing single-cell multiome data sets.

17

Learning the Cellular Dynamics as a Port-Hamiltonian System

Sigdel, D.; Panday, N.

2026-07-13 cell biology 10.64898/2026.07.11.737972 medRxiv

Top 0.2%

6.7%

We present a composite, compartmental, multi-clock port-Hamiltonian model of cell dynamics learned by a graph-neural-network surrogate. The state pairs abundance deviation qj, the quantity omics assays measure, with oscillatory phasors derived only for pools a rhythmicity gate certifies as periodic. The storage function decomposes over five functional compartments (core clock, redox, energy, signalling, biosynthesis), so passivity is certified compartment by compartment, and the model carries two mechanistically distinct clocks coupled through a zero-net-power signalling port, with the central dogma hard-wired and conserved moieties held as exact invariants. We evaluate it on a real mouse-liver tri-omic circadian dataset assembled from public repositories and report a deliberately mixed verdict. The trained model is passive ([Formula], no violations over three seeds), forecasts held-out trajectory segments (RMSE 0.324 ± 0.0004), and recovers withheld regulatory edges above a permuted null (AUROC 0.94 ± 0.01). Its central prediction -- that cross-omic phase lags follow Δφ = arctan(ω/kdeg) -- matches the aggregate transcript-to-protein lag measured independently (5.7 vs 4.9 h) but not the gene-to-gene variation, and the internal two-clock cascade is not scoreable on the available cross-cohort metabolome. The framework thus gives a falsifiable, thermodynamically-grounded account of cell dynamics with explicit limits.
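To make the phase-lag relation quoted in the abstract concrete, the sketch below evaluates a lag of arctan(ω/kdeg) radians and converts it to hours by dividing by ω. The conversion to hours, the 24 h period, and the example degradation rates are assumptions made for illustration; none of them are taken from the preprint.

```python
import math


def phase_lag_hours(kdeg_per_hour: float, period_hours: float = 24.0) -> float:
    """Time lag implied by a phase lag of arctan(omega / kdeg).

    kdeg_per_hour -- first-order degradation rate (1/h), must be positive
    period_hours  -- oscillation period; 24 h is assumed for a circadian rhythm
    """
    if kdeg_per_hour <= 0:
        raise ValueError("kdeg_per_hour must be positive")
    omega = 2.0 * math.pi / period_hours          # angular frequency (rad/h)
    delta_phi = math.atan(omega / kdeg_per_hour)  # phase lag (rad)
    return delta_phi / omega                      # phase lag expressed in hours


# When kdeg equals omega the phase lag is pi/4, i.e. one eighth of the period.
print(phase_lag_hours(2.0 * math.pi / 24.0))
# Very slow degradation pushes the lag toward a quarter period.
print(phase_lag_hours(1e-9))
```

Under these assumptions the lag is bounded above by a quarter of the period (6 h for a 24 h rhythm), and slower degradation gives a longer lag.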

18

A Spatial Agent-Based Model of AGE-RAGE Feedback in Hepatic Fibrosis Reveals Stage-Dependent Irreversibility Thresholds

Eskridge, W.

2026-07-13 systems biology 10.64898/2026.07.08.737277 medRxiv

Top 0.2%

6.6%

Hepatic fibrosis progression involves a well-characterized but computationally unmodeled feedback loop: Advanced Glycation End-products (AGEs) accumulate on permanent collagen via Maillard chemistry, activate the Receptor for Advanced Glycation End-products (RAGE) on hepatic stellate cells (HSCs) and Kupffer cells, drive NF-κB-mediated HSC activation and anti-apoptotic signaling, and deplete soluble RAGE (sRAGE) through hepatocyte loss -- creating a closed positive feedback loop. To our knowledge, we present the first spatial agent-based model incorporating the complete AGE-RAGE-sRAGE axis in a three-dimensional GPU-accelerated liver tissue simulation. The model produces three key findings: (i) stage-dependent irreversibility thresholds emerge without explicit stage-gating, with resolution declining from ~68% at F2 to <10% at F4; (ii) sRAGE trajectories diverge at F2-F3: recovering during abstinence from F2 (0.68 → 0.87) but remaining depleted from F3 (0.54), predicting a clinically testable biomarker transition; and (iii) RAGE-driven HSC activation becomes self-sustaining at F3+ independent of exogenous injury, explaining why late-stage fibrosis resists resolution despite removal of the primary insult. No prior computational model -- ODE, PDE, or agent-based -- has formalized the complete RAGE-AGE-sRAGE feedback loop in hepatic fibrosis. The sRAGE divergence prediction is independently testable in clinical cohorts.

19

GLproxScape reconstructs spatial chromatin occupancy landscapes from tiled genomic locus proteomics

Ozcan, S. C.; Sergi, B.; Yildirim, B.; Cagiral, U.; Gonen, M.; ACILAN AYHAN, C.

2026-07-03 bioinformatics 10.64898/2026.06.29.735243 medRxiv

Top 0.2%

6.5%

Genomic locus proteomics combines proximity labeling with mass spectrometry to identify the proteins associated with user-defined genomic loci. However, per-region enrichment values from tiled guide designs are typically pooled before hit calling, collapsing the latent spatial structure encoded by overlapping measurements. Here, we describe GLproxScape, an R package that treats per-region enrichments as indirect spatial measurements and reconstructs latent chromatin occupancy landscapes through a Gaussian labeling-kernel forward model. Sequence-specific transcription factors are resolved by motif-anchored non-negative least-squares deconvolution against JASPAR or HOCOMOCO position weight matrices, while chromatin regulators which lack defined DNA-binding motifs are inferred as broad occupancy zones, enabling recovery of overlapping members of multi-subunit complexes. Applied to published genomic locus proteomics datasets at the human TERT, MYC, FOXP2, and FOXQ1 loci and the mouse Ripk3 locus, GLproxScape recovered known regulators with predicted positions independently supported by ChIP-Atlas peaks, reconstructed candidate co-binding relationships, and identified chromatin complexes inaccessible to pooled analyses. Systematic sgRNA-ablation experiments further showed that densely tiled designs improve event recovery and positional stability, providing concrete experimental guidance for future genomic locus proteomics studies.
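
To make the idea of motif-anchored non-negative least-squares deconvolution under a Gaussian labeling kernel more concrete, here is a minimal toy sketch in Python. It is not the GLproxScape API (the package is written in R), and every name and number in it is an assumption: the tile centres, candidate site positions, kernel width `sigma`, and the dependency-free projected-gradient solver are all hypothetical choices made for illustration. In practice a library NNLS routine would be the more usual choice.

```python
import math

def gaussian_kernel_matrix(tile_centers, site_positions, sigma):
    """A[i][j]: labeling signal that tile i receives from a site at position j."""
    return [[math.exp(-0.5 * ((t - s) / sigma) ** 2) for s in site_positions]
            for t in tile_centers]

def forward(A, amplitudes):
    """Forward model: predicted per-tile enrichment for given site occupancies."""
    return [sum(a * x for a, x in zip(row, amplitudes)) for row in A]

def nnls_projected_gradient(A, y, n_iter=2000):
    """Minimise 0.5 * ||A x - y||^2 subject to x >= 0 by projected gradient descent."""
    n_sites = len(A[0])
    # 1 / trace(A^T A) never exceeds 1 / L (L = largest eigenvalue of A^T A),
    # so it is a safe, if conservative, step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_sites
    for _ in range(n_iter):
        residual = [p - obs for p, obs in zip(forward(A, x), y)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(n_sites)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, grad)]
    return x

if __name__ == "__main__":
    tiles = list(range(0, 2000, 100))   # hypothetical tile centres (bp)
    sites = [400, 1000, 1500]           # hypothetical motif-anchored positions (bp)
    A = gaussian_kernel_matrix(tiles, sites, sigma=150.0)
    true_amplitudes = [2.0, 0.0, 1.0]   # the middle site is unoccupied
    y = forward(A, true_amplitudes)     # noise-free synthetic enrichments
    print(nnls_projected_gradient(A, y))
```

Anchoring the columns of the kernel matrix at candidate motif positions is what turns the overlapping per-tile measurements into a small, well-posed inverse problem. The non-negativity constraint reflects that occupancy cannot be negative.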

20

A meta-analysis resolves the huntingtin interactome into coactivator losses and a robust proteostatic and synaptic gain network

Seefelder, M.

2026-07-09 neuroscience 10.64898/2026.07.06.736704 medRxiv

Top 0.2%

6.3%

Transcriptional dysregulation and proteostatic collapse are cardinal yet mechanistically separate features of Huntington disease (HD), and how the polyglutamine (polyQ) expansion in huntingtin (HTT) rewires its interactome to produce both remains unresolved. We integrated four published HTT affinity-proteomics datasets and contrasted wild-type and polyQ-expanded HTT within one Bayesian model (BayesInteractomics). Of 4,338 proteins, 275 were condition-dependent: the expansion strips HTT of the transcription-activation machinery (Mediator, the ASCOM H3K4-methyltransferase, CREBBP, CDK9) while gaining contacts with the 26S proteasome, HSP70 chaperones and a synaptic and actin-cytoskeletal network, around an intact chaperonin-HAP40 core. This picture emerges only from integration: the datasets overlap so little that a consensus requiring reproducibility in at least 2 studies would recover approximately 21% of the high-confidence interactors. By reconciling the transcriptional and proteotoxic arms of HD within one quantitative interactome, this loss-plus-gain model recasts two historically separate disease mechanisms as complementary and nominates prioritised interfaces (HTT-Mediator/ASCOM, HTT-proteasome) for validation and therapeutic targeting.